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Abstract 

Due to its constrained support, the Dirichlet distribution is uniquely suited to many 
applications. The constraints that make it powerful, however, can also hinder practical 
implementations, particularly those utilizing Markov Chain Monte Carlo (MCMC) 
techniques such as Hamiltonian Monte Carlo. I demonstrate a series of transformations 
that reshape the canonical Dirichlet distribution into a form much more amenable to 
MCMC algorithms. 



The Dirichlet Distribution 

Given m random variables, x, with the constrained support 

< Xi < 1 

Xi = l ' 



i=i 

the Dirichlet distribution [IJ [2] is defined by the parameterized probability density 

Dir Ma) - [ ^ i=l Q * j • TT Z'" 1 

where the a are positive-valued reals and T is the usual gamma function. 

Because of its distinctive support, the Dirichlet distribution is particularly well-suited 
for modeling the allocation of conserved quantities such as probability. The distribution 
becomes invaluable when studying categorical problems: inference with histograms a per- 
vasive example. 
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Generating Dirichlet Samples 



Sampling from the Dirichlet distribution is made feasible due to a convenient property of 
the Gamma distribution [3]. An ensemble of Gamma variates, 



Given the efficiency of modern Gamma generators [3], the generation of independent Dirich- 
let variates is particularly undemanding. 

The same property admits a parallel Markov Chain Monte Carlo approach. Here sep- 
arate chains generate the independent Gamma variates, which are then normalized to 
produce the desired Dirichlet sample. Because the chains interact only in the normaliza- 
tion, however, their evolutions are uncorrelated and consequently useless to algorithms that 
rely on the local correlations of the distribution such as nested sampling [5j. Moreover, the 
need to interrupt the evolution to normalize prohibits the generalization to chains sampling 
from a posterior distribution predicated on a Dirichlet prior. 

Creating a Markov chain that samples directly from the Dirichlet distribution is compli- 
cated by the constraints, as the Dirichlet variates manifest not in m dimensions but rather 
in a m — 1 dimensional submanifold known as a simplex. Unless the Markov transitions 
accommodate the constraints directly, proposed samples will fall outside of the simplex 
(with probability 1) and the chain will be unable to progress beyond an initial seed. 

By parameterizing the simplex directly, the constraints are implicitly taken into account 
and the chain will have no problem exploring the full support of the Dirichlet distribution. 
Constructing a systematic map between the original m dimensional manifold and the sim- 
plex is straightforward, if a bit ungainly, but real difficulties begin to arise when considering 
the boundary of the simplex. 

Because the simplex is the inclusive volume of a m— 1 dimensional polytope, the support 
is bounded by a surface of piecewise faces, the number of which grows exponentially with the 
dimension of the original distribution. Modeling this complex boundary in terms of simplex 
coordinates is awkward at best, especially when appealing to more sophisticated algorithms 
like constrained Hamiltonian Monte Carlo [H [7] that require a careful description of the 
boundary geometry. 



Ui ~ G&(ui\a i: (3i) 



follows a Dirichlet distribution upon normalization, 




2 



Smoothing Out The Simplex 

A simple change of variables dramatically simplifies the structure of the simplex. 
Taking 

Hi = x ii 

the original Dirichlet distribution becomes 



Dir (via) = 2 m ^S^ • TT V 2 " 1 " 1 



with the support 

< Vl < 1 

m 



i=i 



The quadratic constraint defines a substantially simpler submanifold: the surface of an m 
dimensional hypersphere within the positive orthant. 
Transforming to hyperspherical coordinates [8], 



Ui = r JJ sin k \ ■ 



\k=l 



cos 0{, i < m 
1, i = m ' 



the constrained support becomes 



7T 

< 6>; < - 
- - 2 

r 2 = 1 

with the distribution 

m— 1 



Dir (r, fl|a) = 2 m ^N?^ • r 25 "' 1 TT (cos ^) 2 ° ! ~ 1 (sin^) 25 '" 1 



i=l 
where 

m 
fc=i+l 
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Marginalization over the radial coordinate is immediate due to the constraint, giving a 
distribution for the hyperspherical angles alone 



r (E7=i a i) 



TO— 1 



Dir (0|a) = 2 



iTO — 1 



• n (cos^) 2 " 1 " 1 ^^) 



2&i-l 



with the remaining constraints 



< 6i < -. 
~ ~ 2 



By distorting the original Dirichlet distribution, the simplex has become the constrained 
surface of a hypersphere readily parameterized with hyperspherical coordinates. The com- 
plexity of the hyperspherical boundary scales linearly with dimension and the entire sub- 
manifold becomes particularly accomodating to constrained Hamiltonian Monte Carlo. 

Differentiating on the Hypersphere 

The ability to run a Markov chain that samples directly from the Dirichlet distribution 
accommodates algorithms such as nested sampling, or extending the chain to sample from 
a posterior based on a Dirichlet prior. These applications, however, require the gradient of 
a function, in these cases the log likelihood, with respect to the hyperspherical coordinates. 
Appealing to the chain rule, 




j=l k=l 





TO 



Substituting the hyperspherical derivative, 




0, i>j 
-yj- tan 6i, i = j 
yj/temOi, i<j 
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gives 



Of (x) 



1 j=i+l 



df (x) 

8Xn 



tan 8i 



or, in terms of the original variables, 



g/(x) 

0& 



— x i tan6'i+ > — — 



3=i+l 



tan 0, 



Once the original gradient, df/dxi, has been computed, the hyperspherical derivative 
requires only a few additional tangent evaluations. This additional burden can be elimi- 
nated entirely by storing the sine and cosine evaluations when projecting the hyperspherical 
coordinates 9 to the original coordinates x beforehand. 
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